add model: Qwen3-Omni-30B-A3B-Instruct #18404
Closed
TrevorS wants to merge 7 commits into ggml-org:master
- Add QWEN3OMNI_TALKER architecture enum and string mapping
- Define tensor name mappings for Talker (20-layer MoE transformer)
- Add Code Predictor tensor mappings (5 layers + 15 LM heads)
- Add Code2Wav vocoder tensor mappings (pre-transformer + upsample + decoder)
- Implement Qwen3OmniTalkerModel class with nested config extraction
- Support ModelType.TALKER for speech synthesis pipeline

- Add LLM_ARCH_QWEN3OMNI_TALKER enum and architecture name mapping
- Define Talker tensor keys (transformer, Code Predictor, Code2Wav)
- Add n_thinker_hidden to llama_hparams for cross-model coupling
- Implement qwen3omni_talker.cpp graph builder with MoE routing
- Support 20-layer Talker transformer with 128 experts per layer
- Implement 5-layer Code Predictor with 15 parallel LM heads
- Build Code2Wav vocoder graph (pre-transformer + ConvNeXt upsample + HiFiGAN decoder)
- Add CMakeLists.txt entry for qwen3omni_talker.cpp

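For readers unfamiliar with the "MoE routing" item in the commit above, here is a minimal Python sketch of top-k expert routing for a single token. It is not the PR's ggml graph code. The function name `route_token`, the value of `top_k`, and the choice to apply softmax over only the selected logits are all assumptions for illustration; the commit message states only that each Talker layer has 128 experts.

```python
import math
from typing import List, Tuple


def route_token(router_logits: List[float], top_k: int) -> List[Tuple[int, float]]:
    """Pick the top_k experts for one token (illustrative sketch only).

    Returns (expert_index, weight) pairs. Weights are a softmax over the
    selected logits, so they sum to 1. Whether the real model normalizes
    before or after selection is an assumption here.
    """
    if not 0 < top_k <= len(router_logits):
        raise ValueError("top_k must be between 1 and the number of experts")

    # Indices of the top_k largest logits (ties keep the lower index first).
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:top_k]

    # Numerically stable softmax restricted to the selected experts.
    max_logit = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]
```

The token's output would then be the weighted sum of the selected experts' outputs; only `top_k` of the 128 expert feed-forward blocks run per token.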
- Implement mtmd-tts.cpp CPU inference for Talker + Code2Wav pipeline
- Add mtmd-tts-gpu.cpp CUDA-accelerated graph execution
- Implement mtmd-tts-code2wav.cpp HiFi-GAN vocoder with 16 VQ codebooks
- Support sliding window attention in Code Predictor
- Add RoPE position encoding for autoregressive code prediction
- Implement ConvNeXt upsampling and multi-resolution STFT discriminator
- Update tools/mtmd/CMakeLists.txt to build TTS components

- Implement qwen3omni-audio.cpp for 32-layer Whisper-style audio encoder
- Add Qwen3OmniMmprojModel class in convert_hf_to_gguf.py for dual encoder export
- Support audio encoder flatten + layer norm projection
- Update clip.cpp and clip-impl.h for QWEN3OMNI_AUDIO projector type
- Add audio config normalization for Whisper-style naming
- Update tools/mtmd/models/models.h with Qwen3-Omni audio model registration
- Fix whisper-enc.cpp compatibility with Qwen3-Omni audio pipeline

- Add QWEN3OMNI_VISION projector type constant
- Implement Qwen3OmniVisionMmprojModel class for vision encoder export
- Update qwen3vl-moe.cpp with vision projector support
- Add deepstack layer handling for Qwen3-VL architecture
- Update clip-model.h with QWEN3OMNI_VISION enum
- Support nested thinker.visual.* tensor prefix handling

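The "nested thinker.visual.* tensor prefix handling" item above can be pictured with a small Python sketch. This is not the converter code from the PR. The helper name `normalize_vision_tensor_name`, the flat `visual.` target prefix, and the example tensor name in the usage note are hypothetical; only the `thinker.visual.` prefix comes from the commit message.

```python
from typing import Optional

# Prefix taken from the commit message; the flat target prefix is an assumption.
NESTED_VISION_PREFIX = "thinker.visual."
FLAT_VISION_PREFIX = "visual."


def normalize_vision_tensor_name(name: str) -> Optional[str]:
    """Map a checkpoint tensor name to a flat vision name.

    Returns None for tensors that do not belong to the vision encoder,
    so a caller can skip them during vision projector export.
    """
    if name.startswith(NESTED_VISION_PREFIX):
        # Strip the outer "thinker." nesting but keep the rest of the path.
        return FLAT_VISION_PREFIX + name[len(NESTED_VISION_PREFIX):]
    if name.startswith(FLAT_VISION_PREFIX):
        # Already flat (e.g. a non-nested checkpoint layout).
        return name
    return None
```

Under these assumptions, a name such as `thinker.visual.blocks.0.attn.qkv.weight` would become `visual.blocks.0.attn.qkv.weight`, while a Talker tensor would be skipped.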
- Wire TTS pipeline into mtmd-cli.cpp for end-to-end text→speech
- Add --tts flag for Talker model loading
- Integrate mtmd-tts.h API into mtmd.cpp
- Update mtmd-audio.cpp and mtmd-audio.h for audio encoder handling
- Add TTS tests to tools/mtmd/tests.sh
- Support both CPU and GPU inference paths

- Fix EOS token handling for Qwen3-Omni chat format
- Add GPU preparation for Code Predictor inference
- Update llama-context.cpp and llama-context.h for TTS context
- Add llama-cparams.h changes for Talker cache configuration
- Update llama-impl.cpp with Talker-specific helpers
- Add include/llama.h API extensions for TTS

We don't accept fully AI-written PRs in mtmd. The time it takes contributors to generate such code is much less than the time it takes me to optimize and nitpick it. AI often, if not always, generates sub-optimal ggml code.
Read the contribution guide: break changes down into smaller parts and smaller PRs. For mtmd: don't use AI.
This was referenced Dec 27, 2025
Closing in favor of an incremental approach, starting with #18420.

Hi @TrevorS, would it be possible to add this model to the whisper.cpp repository?
Adds Qwen3-Omni MoE.
What's working:
What's not built and/or untested:
Examples
documentary_test.wav
AI Disclosure
100% of the code in this PR was written by AI.
Branch prior to clean up and rebase: https://github.com/TrevorS/llama.cpp/tree/feature/qwen3-omni-backup-20251226
I don't want to waste anyone's time -- please feel free to tell me to close my PR and go away! :)
Otherwise, I'm happy to work on doing what I need to in order to get any or all of this code merged.